library(sf)
library(raster)
library(tidyverse)
library(spData)
Please note that most text and code in this notebook has been taken from Geocomputation with R book, authored by Robin Lovelace, Jakub Nowosad, and Jannes Muenchow. Several code lines have been changed to produce outputs and particularly for matching the G4D course purposes.
As spatial operations are a vital part of geomatics, this session shows how spatial objects can be modified in a multitude of ways based on their location and shape.
The content builds on the previous session because many spatial operations have a non-spatial (attribute) equivalent. This is especially true for vector operations.
Spatial operations differ from non-spatial operations in some ways, however. While, non-spatial joins depend on the existence of a common attribute, that is the key-column; spatial joins do not, as they take advantage of common location, that is the spatial relation between objects (do they crosses? do they overlap?).
Another unique aspect of spatial objects is distance. All spatial objects are related through space and distance calculations,can be used to explore the strength of this relationship.
Naturally, we can also subset rasters based on location and coordinates and merge different raster tiles into one raster. The most important spatial operation on raster data, however, is map algebra. Map algebra makes raster processing very elegant and fast.
This section provides an overview of spatial operations on vector geographic data represented as simple features in the sf package.
Spatial subsetting is the process of selecting features of a spatial object based on whether or not they in some way relate in space to another object. It is analogous to attribute subsetting and can be done with the base R square bracket ([) operator or with the filter() function from the tidyverse.
An example of spatial subsetting is provided by the nz and nz_height datasets in spData. These contain projected data on the 16 main regions and 101 highest points in New Zealand respectively.
print(nz)
## Simple feature collection with 16 features and 6 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
## epsg (SRID): 2193
## proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## First 10 features:
## Name Island Land_area Population Median_income Sex_ratio
## 1 Northland North 12500.561 175500 23400 0.9424532
## 2 Auckland North 4941.573 1657200 29600 0.9442858
## 3 Waikato North 23900.036 460100 27900 0.9520500
## 4 Bay of Plenty North 12071.145 299900 26200 0.9280391
## 5 Gisborne North 8385.827 48500 24400 0.9349734
## 6 Hawke's Bay North 14137.524 164000 26100 0.9238375
## 7 Taranaki North 7254.480 118000 29100 0.9569363
## 8 Manawatu-Wanganui North 22220.608 234500 25000 0.9387734
## 9 Wellington North 8048.553 513900 32700 0.9335524
## 10 West Coast South 23245.456 32400 26900 1.0139072
## geom
## 1 MULTIPOLYGON (((1745493 600...
## 2 MULTIPOLYGON (((1803822 590...
## 3 MULTIPOLYGON (((1860345 585...
## 4 MULTIPOLYGON (((2049387 583...
## 5 MULTIPOLYGON (((2024489 567...
## 6 MULTIPOLYGON (((2024489 567...
## 7 MULTIPOLYGON (((1740438 571...
## 8 MULTIPOLYGON (((1866732 566...
## 9 MULTIPOLYGON (((1881590 548...
## 10 MULTIPOLYGON (((1557042 531...
print(nz_height)
## Simple feature collection with 101 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1204143 ymin: 5048309 xmax: 1822492 ymax: 5650492
## epsg (SRID): 2193
## proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## First 10 features:
## t50_fid elevation geometry
## 1 2353944 2723 POINT (1204143 5049971)
## 2 2354404 2820 POINT (1234725 5048309)
## 3 2354405 2830 POINT (1235915 5048745)
## 4 2369113 3033 POINT (1259702 5076570)
## 5 2362630 2749 POINT (1378170 5158491)
## 6 2362814 2822 POINT (1389460 5168749)
## 7 2362817 2778 POINT (1390166 5169466)
## 8 2363991 3004 POINT (1372357 5172729)
## 9 2363993 3114 POINT (1372062 5173236)
## 10 2363994 2882 POINT (1372810 5173419)
Let’s create an object representing Canterbury using attribute subsetting, then use spatial subsetting to return all high points in the region:
### this is atrribute subsetting
canterbury = nz %>% filter(Name == "Canterbury")
print(canterbury)
## Simple feature collection with 1 feature and 6 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 1325039 ymin: 5004766 xmax: 1686902 ymax: 5360239
## epsg (SRID): 2193
## proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## Name Island Land_area Population Median_income Sex_ratio
## 1 Canterbury South 44504.5 612000 30100 0.9753265
## geom
## 1 MULTIPOLYGON (((1686902 535...
## this is spatial subsetting
canterbury_height = nz_height[canterbury, ]
print(canterbury_height)
## Simple feature collection with 70 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1365809 ymin: 5158491 xmax: 1654899 ymax: 5350463
## epsg (SRID): 2193
## proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## First 10 features:
## t50_fid elevation geometry
## 5 2362630 2749 POINT (1378170 5158491)
## 6 2362814 2822 POINT (1389460 5168749)
## 7 2362817 2778 POINT (1390166 5169466)
## 8 2363991 3004 POINT (1372357 5172729)
## 9 2363993 3114 POINT (1372062 5173236)
## 10 2363994 2882 POINT (1372810 5173419)
## 11 2363995 2796 POINT (1372579 5173989)
## 13 2363997 3070 POINT (1373796 5174144)
## 14 2363998 3061 POINT (1373955 5174231)
## 15 2363999 3077 POINT (1373984 5175228)
library(tmap)
p_hpnz1 = tm_shape(nz) + tm_polygons(col = "white") +
tm_shape(nz_height) + tm_symbols(shape = 2, col = "red", size = 0.25) +
tm_layout(main.title = "High points in New Zealand", main.title.size = 1,
bg.color = "lightblue")
p_hpnz2 = tm_shape(nz) + tm_polygons(col = "white") +
tm_shape(canterbury) + tm_fill(col = "gray") +
tm_shape(canterbury_height) + tm_symbols(shape = 2, col = "red", size = 0.25) +
tm_layout(main.title = "High points in Canterbury", main.title.size = 1,
bg.color = "lightblue")
tmap_arrange(p_hpnz1, p_hpnz2, ncol = 2)
Like attribute subsetting x[y, ] subsets features of a target x using the contents of a source object y. Instead of y being of class logical or integer — a vector of TRUE and FALSE values or whole numbers — for spatial subsetting it is another spatial (sf) object.
Various topological relations can be used for spatial subsetting. These determine the type of spatial relationship that features in the target object must have with the subsetting object to be selected, including touches, crosses or within. Intersects is the default spatial subsetting operator, a default that returns TRUE for many types of spatial relations, including touches, crosses and is within. These alternative spatial operators can be specified with the op = argument, a third argument that can be passed to the [ operator for sf objects. This is demonstrated in the following command which returns the opposite of st_intersect(), points that do not intersect with Canterbury:
outside_height = nz_height[canterbury, , op = st_disjoint]
print(outside_height)
## Simple feature collection with 31 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1204143 ymin: 5048309 xmax: 1822492 ymax: 5650492
## epsg (SRID): 2193
## proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## First 10 features:
## t50_fid elevation geometry
## 1 2353944 2723 POINT (1204143 5049971)
## 2 2354404 2820 POINT (1234725 5048309)
## 3 2354405 2830 POINT (1235915 5048745)
## 4 2369113 3033 POINT (1259702 5076570)
## 12 2363996 2759 POINT (1373264 5175442)
## 25 2364028 2756 POINT (1374183 5177165)
## 26 2364029 2800 POINT (1374469 5176966)
## 27 2364031 2788 POINT (1375422 5177253)
## 46 2364166 2782 POINT (1383006 5181085)
## 47 2364167 2905 POINT (1383486 5181270)
library(tmap)
p_hpnz1 = tm_shape(nz) + tm_polygons(col = "white") +
tm_shape(nz_height) + tm_symbols(shape = 20, col = "red", size = 0.25) +
tm_layout(main.title = "High points in New Zealand", main.title.size = 1,
bg.color = "lightblue")
p_hpnz2 = tm_shape(nz) + tm_polygons(col = "white") +
tm_shape(canterbury) + tm_fill(col = "gray") +
tm_shape(outside_height) + tm_symbols(shape = 20, col = "red", size = 0.25) +
tm_layout(main.title = "High points not in Canterbury", main.title.size = 1,
bg.color = "lightblue")
tmap_arrange(p_hpnz1, p_hpnz2, ncol = 2)
For many applications this is all you’ll need to know about spatial subsetting for vector data.
Another way of doing spatial subsetting uses objects returned by topological operators. This is demonstrated in the code chunk below where an object of class sgbp (a sparse geometry binary predicate, a list of length x in the spatial operation) is created:
sel_sgbp = st_intersects(x = nz_height, y = canterbury)
sel_sgbp
## Sparse geometry binary predicate list of length 101, where the predicate was `intersects'
## first 10 elements:
## 1: (empty)
## 2: (empty)
## 3: (empty)
## 4: (empty)
## 5: 1
## 6: 1
## 7: 1
## 8: 1
## 9: 1
## 10: 1
Now, the sgbp will be converted into a logical vector sel_logical (containing only TRUE and FALSE values). The function lengths() will be used to identify which features in nz_height intersect with any objects in y. In this case 1 is the greatest possible value but for more complex operations one could use the method to subset only features that intersect with, for example, 2 or more features from the source object.
sel_logical = lengths(sel_sgbp) > 0
canterbury_height2 = nz_height[sel_logical, ]
print(canterbury_height2)
## Simple feature collection with 70 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1365809 ymin: 5158491 xmax: 1654899 ymax: 5350463
## epsg (SRID): 2193
## proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## First 10 features:
## t50_fid elevation geometry
## 5 2362630 2749 POINT (1378170 5158491)
## 6 2362814 2822 POINT (1389460 5168749)
## 7 2362817 2778 POINT (1390166 5169466)
## 8 2363991 3004 POINT (1372357 5172729)
## 9 2363993 3114 POINT (1372062 5173236)
## 10 2363994 2882 POINT (1372810 5173419)
## 11 2363995 2796 POINT (1372579 5173989)
## 13 2363997 3070 POINT (1373796 5174144)
## 14 2363998 3061 POINT (1373955 5174231)
## 15 2363999 3077 POINT (1373984 5175228)
Note that the canterbury_height2 objects is identical to the previouly obtained canterbury_height object.
identical(canterbury_height, canterbury_height2)
## [1] TRUE
It is noted that a logical can also be used with filter() as follows (sparse = FALSE is explained in section @ref(topological-relations)):
canterbury_height3 = nz_height %>%
filter(st_intersects(x = ., y = canterbury, sparse = FALSE))
print(canterbury_height3)
## Simple feature collection with 70 features and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1365809 ymin: 5158491 xmax: 1654899 ymax: 5350463
## epsg (SRID): 2193
## proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## First 10 features:
## t50_fid elevation geometry
## 1 2362630 2749 POINT (1378170 5158491)
## 2 2362814 2822 POINT (1389460 5168749)
## 3 2362817 2778 POINT (1390166 5169466)
## 4 2363991 3004 POINT (1372357 5172729)
## 5 2363993 3114 POINT (1372062 5173236)
## 6 2363994 2882 POINT (1372810 5173419)
## 7 2363995 2796 POINT (1372579 5173989)
## 8 2363997 3070 POINT (1373796 5174144)
## 9 2363998 3061 POINT (1373955 5174231)
## 10 2363999 3077 POINT (1373984 5175228)
At this point there are three versions of canterbury_height, one created with spatial subsetting directly and the other two via intermediary selection objects. To explore these objects and spatial subsetting in more detail, see the supplementary vignettes on subsetting and tidverse-pitfalls.
Topological relations describe the spatial relationships between objects. The Geographic Information Technology Training Alliance - GITTA illustrates topological relations.
To understand such relations, it helps to have some simple test data to work with. Figure @ref(fig:relation-objects) contains a polygon (a), a line (l) and some points (p), which are created in the code below.
# create a polygon
a_poly = st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(-1, -1))))
a = st_sfc(a_poly)
# create a line
l_line = st_linestring(x = matrix(c(-1, -1, -0.5, 1), ncol = 2))
l = st_sfc(l_line)
# create points
p_matrix = matrix(c(0.5, 1, -1, 0, 0, 1, 0.5, 1), ncol = 2)
p_multi = st_multipoint(x = p_matrix)
p = st_cast(st_sfc(p_multi), "POINT")
par(pty = "s")
plot(a, border = "red", col = "gray", axes = TRUE)
plot(l, add = TRUE)
plot(p, add = TRUE, lab = 1:4)
text(p_matrix[, 1] + 0.05, p_matrix[, 2] - 0.08, 1:4, cex = 1.0)
A simple query is: which of the points in p intersect in some way with polygon a? The question can be answered by inspection (points 1 and 2 are over or touch the triangle). It can also be answered by using a spatial predicate such as do the objects intersect? This is implemented in sf as follows:
st_intersects(p, a)
## Sparse geometry binary predicate list of length 4, where the predicate was `intersects'
## 1: 1
## 2: 1
## 3: (empty)
## 4: (empty)
The contents of the result should be as you expected: the function returns a positive (1) result for the first two points, and a negative result (represented by an empty vector) for the last two. What may be unexpected is that the result comes in the form of a list of vectors. This sparse matrix output only registers a relation if one exists, reducing the memory requirements of topological operations on multi-feature objects. As we saw in the previous section a dense matrix consisting of TRUE or FALSE values for each combination of features can also be returned when sparse = FALSE:
st_intersects(p, a, sparse = FALSE)
## [,1]
## [1,] TRUE
## [2,] TRUE
## [3,] FALSE
## [4,] FALSE
The output is a matrix in which each row represents a feature in the target object and each column represents a feature in the selecting object. In this case only the first two features in p intersect with a and there is only one feature in a so the result has only one column. The result can be used for subsetting as we saw in section @ref(spatial-subsetting).
Note that st_intersects() returns TRUE for the second feature in the object p even though it just touches the polygon a: intersects is a ‘catch-all’ topological operation which identifies many types of spatial relation.
The opposite of st_intersects() is st_disjoint(), which returns only objects that do not spatially relate in any way to the selecting object (note [, 1] converts the result into a vector):
st_disjoint(p, a, sparse = FALSE)
## [,1]
## [1,] FALSE
## [2,] FALSE
## [3,] TRUE
## [4,] TRUE
st_within() returns TRUE only for objects that are completely within the selecting object. This applies only to the first object, which is inside the triangular polygon, as illustrated below:
st_within(p, a, sparse = FALSE)[, 1]
## [1] TRUE FALSE FALSE FALSE
Note that although the first point is within the triangle, it does not touch any part of its border. For this reason st_touches() only returns TRUE for the second point:
st_touches(p, a, sparse = FALSE)[, 1]
## [1] FALSE TRUE FALSE FALSE
What about features that do not touch, but almost touch the selection object? These can be selected using st_is_within_distance(), which has an additional dist argument. It can be used to set how close target objects need to be before they are selected. Note that although point 4 is one unit of distance from the nearest node of a (at point 2 in Figure @ref(fig:relation-objects)), it is still selected when the distance is set to 0.9. This is illustrated in the code chunk below, the second line of which converts the lengthy list output into a logical object:
sel = st_is_within_distance(p, a, dist = 0.9) # can only return a sparse matrix
lengths(sel) > 0
## [1] TRUE TRUE FALSE TRUE
Let’s check what is the distance between each point and the polygon:
st_distance(p, a, by_element = TRUE)
## [1] 0.0000000 0.0000000 1.0606602 0.7071068
# other tests
st_overlaps(p, a, sparse = FALSE)
## [,1]
## [1,] FALSE
## [2,] FALSE
## [3,] FALSE
## [4,] FALSE
st_covers(p, a, sparse = FALSE)
## [,1]
## [1,] FALSE
## [2,] FALSE
## [3,] FALSE
## [4,] FALSE
st_covered_by(p, a, sparse = FALSE)
## [,1]
## [1,] TRUE
## [2,] TRUE
## [3,] FALSE
## [4,] FALSE
p[1]
## Geometry set for 1 feature
## geometry type: POINT
## dimension: XY
## bbox: xmin: 0.5 ymin: 0 xmax: 0.5 ymax: 0
## epsg (SRID): NA
## proj4string: NA
## POINT (0.5 0)
# starting simpler so commented
a1 = st_polygon(list(rbind(c(-1, -1), c(1, -1), c(1, 1), c(0, 1), c(-1, 0.5), c(-1, -1))))
a2 = st_polygon(list(rbind(c(2, 0), c(2, 2), c(3, 2), c(3, 0), c(2, 0))))
a = st_sfc(a1, a2)
b1 = a1 * 0.5
b2 = a2 * 0.4 + c(1, 0.5)
b = st_sfc(b1, b2)
l1 = st_linestring(x = matrix(c(0, 3, -1, 1), , 2))
l2 = st_linestring(x = matrix(c(-2, -1.5, -2, 1.5), , 2))
l = st_sfc(l1, l2)
p = st_multipoint(x = matrix(c(0.5, 1, -1, -1.5, 0, 1, 0.5, 1.5), , 2))
plot(a, border = "green", col="forestgreen", axes = TRUE, main= "Polygons in a: green - in b: blue")
plot(b, border = "blue", col="dodgerblue", add = TRUE)
plot(l, col="purple", lwd=3.0, add = TRUE)
plot(p, col="red", pch=20, cex= 3,add = TRUE)
st_equals(b, b, sparse = FALSE)
## [,1] [,2]
## [1,] TRUE FALSE
## [2,] FALSE TRUE
st_contains(a, b, sparse = FALSE)
## [,1] [,2]
## [1,] TRUE FALSE
## [2,] FALSE FALSE
st_contains_properly(a, b, sparse = FALSE)
## [,1] [,2]
## [1,] TRUE FALSE
## [2,] FALSE FALSE
st_covers(a, b, sparse = FALSE)
## [,1] [,2]
## [1,] TRUE FALSE
## [2,] FALSE FALSE
st_covered_by(b, a, sparse = FALSE)
## [,1] [,2]
## [1,] TRUE FALSE
## [2,] FALSE FALSE
Joining two non-spatial datasets relies on a shared ‘key’ variable, as described in last session. Spatial data joining applies the same concept, but instead relies on shared areas of geographic space (it is also know as spatial overlay). As with attribute data, joining adds a new column to the target object (the argument x in joining functions), from a source object (y).
The process can be illustrated by an example using the world spatial object from last session. Imagine you have ten points randomly distributed across the Earth’s surface. Of the points that are on land, which countries are they in? Random points to demonstrate spatial joining are created as follows:
nworld <- world
print(nworld)
## Simple feature collection with 177 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## iso_a2 name_long continent region_un subregion
## 1 FJ Fiji Oceania Oceania Melanesia
## 2 TZ Tanzania Africa Africa Eastern Africa
## 3 EH Western Sahara Africa Africa Northern Africa
## 4 CA Canada North America Americas Northern America
## 5 US United States North America Americas Northern America
## 6 KZ Kazakhstan Asia Asia Central Asia
## 7 UZ Uzbekistan Asia Asia Central Asia
## 8 PG Papua New Guinea Oceania Oceania Melanesia
## 9 ID Indonesia Asia Asia South-Eastern Asia
## 10 AR Argentina South America Americas South America
## type area_km2 pop lifeExp gdpPercap
## 1 Sovereign country 19289.97 885806 69.96000 8222.254
## 2 Sovereign country 932745.79 52234869 64.16300 2402.099
## 3 Indeterminate 96270.60 NA NA NA
## 4 Sovereign country 10036042.98 35535348 81.95305 43079.143
## 5 Country 9510743.74 318622525 78.84146 51921.985
## 6 Sovereign country 2729810.51 17288285 71.62000 23587.338
## 7 Sovereign country 461410.26 30757700 71.03900 5370.866
## 8 Sovereign country 464520.07 7755785 65.23000 3709.082
## 9 Sovereign country 1819251.33 255131116 68.85600 10003.089
## 10 Sovereign country 2784468.59 42981515 76.25200 18797.548
## geom
## 1 MULTIPOLYGON (((180 -16.067...
## 2 MULTIPOLYGON (((33.90371 -0...
## 3 MULTIPOLYGON (((-8.66559 27...
## 4 MULTIPOLYGON (((-122.84 49,...
## 5 MULTIPOLYGON (((-122.84 49,...
## 6 MULTIPOLYGON (((87.35997 49...
## 7 MULTIPOLYGON (((55.96819 41...
## 8 MULTIPOLYGON (((141.0002 -2...
## 9 MULTIPOLYGON (((141.0002 -2...
## 10 MULTIPOLYGON (((-68.63401 -...
set.seed(2018) # set seed for reproducibility
(bb_world = st_bbox(nworld)) # the world's bounds
## xmin ymin xmax ymax
## -180.00000 -90.00000 180.00000 83.64513
random_df = tibble(
x = runif(n = 10, min = bb_world[1], max = bb_world[3]),
y = runif(n = 10, min = bb_world[2], max = bb_world[4])
)
random_points = random_df %>%
st_as_sf(coords = c("x", "y")) %>% # set coordinates
st_set_crs(4326) # set geographic CRS
print(random_points)
## Simple feature collection with 10 features and 0 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -158.1893 ymin: -42.98793 xmax: 165.1157 ymax: 80.53902
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## geometry
## 1 POINT (-58.98475 -21.30321)
## 2 POINT (-13.05962 25.39389)
## 3 POINT (-158.1893 80.53902)
## 4 POINT (-108.9239 27.7688)
## 5 POINT (-9.24689 49.9628)
## 6 POINT (-71.6225 20.12225)
## 7 POINT (38.43319 -42.98793)
## 8 POINT (-133.1956 6.009109)
## 9 POINT (165.1157 38.14241)
## 10 POINT (16.86582 53.84769)
plot(st_geometry(nworld), axes=T)
plot(st_geometry(random_points), col="red", pch=20, cex= 1, add=T)
The scenario is illustrated in Figure @ref(fig:spatial-join). The random_points object (top left) has no attribute data, while the world (top right) does. The spatial join operation is done by st_join(), which adds the NAME variable to the points, resulting in random_joined sf object.
#world$name_long = as.character(world$name_long)
world_random = nworld[random_points, ]
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
print(world_random)
## Simple feature collection with 4 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -117.1278 ymin: -27.5485 xmax: 24.02999 ymax: 54.85154
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## iso_a2 name_long continent region_un subregion
## 28 MX Mexico North America Americas Central America
## 114 PL Poland Europe Europe Eastern Europe
## 157 PY Paraguay South America Americas South America
## 163 MA Morocco Africa Africa Northern Africa
## type area_km2 pop lifeExp gdpPercap
## 28 Sovereign country 1969480.3 124221600 76.75300 16622.597
## 114 Sovereign country 310402.3 38011735 77.60244 24347.074
## 157 Sovereign country 401335.9 6552584 72.91300 8501.544
## 163 Sovereign country 591719.0 34318082 75.30900 7078.881
## geom
## 28 MULTIPOLYGON (((-117.1278 3...
## 114 MULTIPOLYGON (((23.48413 53...
## 157 MULTIPOLYGON (((-58.16639 -...
## 163 MULTIPOLYGON (((-2.169914 3...
random_joined = sf::st_join(random_points, nworld, left=FALSE)
## although coordinates are longitude/latitude, st_intersects assumes that they are planar
print(random_joined)
## Simple feature collection with 4 features and 10 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -108.9239 ymin: -21.30321 xmax: 16.86582 ymax: 53.84769
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## iso_a2 name_long continent region_un subregion
## 1 PY Paraguay South America Americas South America
## 2 MA Morocco Africa Africa Northern Africa
## 3 MX Mexico North America Americas Central America
## 4 PL Poland Europe Europe Eastern Europe
## type area_km2 pop lifeExp gdpPercap
## 1 Sovereign country 401335.9 6552584 72.91300 8501.544
## 2 Sovereign country 591719.0 34318082 75.30900 7078.881
## 3 Sovereign country 1969480.3 124221600 76.75300 16622.597
## 4 Sovereign country 310402.3 38011735 77.60244 24347.074
## geometry
## 1 POINT (-58.98475 -21.30321)
## 2 POINT (-13.05962 25.39389)
## 3 POINT (-108.9239 27.7688)
## 4 POINT (16.86582 53.84769)
plot(st_geometry(world_random), axes=T)
plot(st_geometry(random_joined), col="forestgreen", pch=20, cex= 1, add=T)
By default, st_join() performs a left join, but it can also do inner joins by setting the argument left = FALSE. Like spatial subsetting, the default topological operator used by st_join() is st_intersects(). This can be changed with the join argument (see ?st_join for details).
In the example above, we have added features of a polygon layer to a point layer. In other cases, we might want to join point attributes to a polygon layer. There might be occassions where more than one point falls inside one polygon. In such a case st_join() duplicates the polygon feature, i.e. adds a new row to the polygon layer, for each multiple match.
Sometimes two geographic datasets do not touch but still have a strong geographic relationship enabling joins. The datasets cycle_hire and cycle_hire_osm, already attached in the spData package, provide a good example. Plotting them shows that they are often closely related but they do not touch, as shown with the following code below:
plot(st_geometry(cycle_hire), col = "blue")
plot(st_geometry(cycle_hire_osm), add = TRUE, pch = 3, col = "red")
Let’s review what these datasets are about:
print(cycle_hire)
## Simple feature collection with 742 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -0.2367699 ymin: 51.45475 xmax: -0.002275 ymax: 51.54214
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## id name area nbikes nempty
## 1 1 River Street Clerkenwell 4 14
## 2 2 Phillimore Gardens Kensington 2 34
## 3 3 Christopher Street Liverpool Street 0 32
## 4 4 St. Chad's Street King's Cross 4 19
## 5 5 Sedding Street Sloane Square 15 12
## 6 6 Broadcasting House Marylebone 0 18
## 7 7 Charlbert Street St. John's Wood 15 0
## 8 8 Lodge Road St. John's Wood 5 13
## 9 9 New Globe Walk Bankside 3 16
## 10 10 Park Street Bankside 1 17
## geometry
## 1 POINT (-0.1099705 51.52916)
## 2 POINT (-0.1975742 51.49961)
## 3 POINT (-0.08460569 51.52128)
## 4 POINT (-0.1209737 51.53006)
## 5 POINT (-0.156876 51.49313)
## 6 POINT (-0.1442289 51.51812)
## 7 POINT (-0.1680743 51.5343)
## 8 POINT (-0.1701345 51.52834)
## 9 POINT (-0.09644075 51.50739)
## 10 POINT (-0.09275416 51.50597)
print(cycle_hire_osm)
## Simple feature collection with 540 features and 5 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: -0.229123 ymin: 51.45927 xmax: -0.0024191 ymax: 51.54683
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## First 10 features:
## osm_id name capacity cyclestreets_id
## 1 108539 Windsor Terrace 14 <NA>
## 2 598093293 Pancras Road, King's Cross NA <NA>
## 3 772536185 Clerkenwell, Ampton Street 11 <NA>
## 4 772541878 <NA> NA <NA>
## 5 781506147 <NA> NA <NA>
## 6 783824668 Finsbury Library, EC1 NA <NA>
## 7 783824677 Margery Street, WC1 NA <NA>
## 8 787313764 Chiswell Street, EC1 20 <NA>
## 9 814136668 George Place Mews 6 <NA>
## 10 820945916 Drury Lane 17 <NA>
## description geometry
## 1 <NA> POINT (-0.0933878 51.52913)
## 2 <NA> POINT (-0.1293092 51.53402)
## 3 <NA> POINT (-0.1182352 51.52729)
## 4 <NA> POINT (-0.090836 51.52583)
## 5 <NA> POINT (-0.1210572 51.53001)
## 6 <NA> POINT (-0.1038272 51.52594)
## 7 <NA> POINT (-0.1123251 51.52665)
## 8 <NA> POINT (-0.0898853 51.52086)
## 9 <NA> POINT (-0.158217 51.51682)
## 10 <NA> POINT (-0.122162 51.51474)
We can check if any points are the same st_intersects() as shown below:
any(st_touches(cycle_hire, cycle_hire_osm, sparse = FALSE))
## [1] FALSE
# included to show alternative ways of showing there's no overlap
sum(st_geometry(cycle_hire) %in% st_geometry(cycle_hire_osm))
## [1] 0
sum(st_coordinates(cycle_hire)[, 1] %in% st_coordinates(cycle_hire_osm)[, 1])
## [1] 0
The spatial distribution of cycle hire points in London based on official data (blue) and OpenStreetMap data (red).
Imagine that we need to join the capacity variable in cycle_hire_osm onto the official ‘target’ data contained in cycle_hire. This is when a non-overlapping join is needed. The simplest method is to use the topological operator st_is_within_distance() using a threshold distance of 20 m. Note that before performing the relation both datasets must be transformed into a projected CRS, saved as new objects denoted by the affix P (for projected) below:
cycle_hire_P = st_transform(cycle_hire, 27700)
cycle_hire_osm_P = st_transform(cycle_hire_osm, 27700)
sel = st_is_within_distance(cycle_hire_P, cycle_hire_osm_P, dist = 20)
summary(lengths(sel) > 0)
## Mode FALSE TRUE
## logical 304 438
This shows that there are 438 points in the target object cycle_hire_P within the threshold distance of cycle_hire_osm_P. How to retrieve the values associated with the respective cycle_hire_osm_P points? The solution is again with st_join(), but with an addition dist argument (set to 20 m below):
z = st_join(cycle_hire_P, cycle_hire_osm_P, st_is_within_distance, dist = 20)
nrow(cycle_hire)
## [1] 742
nrow(z)
## [1] 762
Note that the number of rows in the joined result is greater than the target. This is because some cycle hire stations in cycle_hire_P have multiple matches in cycle_hire_osm_P. To aggregate the values for the overlapping points and return the mean, we can use the aggregation methods learned in last session, resulting in an object with the same number of rows as the target:
z = z %>%
group_by(id) %>%
summarize(capacity = mean(capacity))
nrow(z) == nrow(cycle_hire)
## [1] TRUE
The capacity of nearby stations can be verified by comparing a plot of the capacity of the source cycle_hire_osm data with the results in this new object (plots not shown):
plot(cycle_hire_osm["capacity"])
plot(z["capacity"])
The result of this join has used a spatial operation to change the attribute data associated with simple features but the geometry associated with each feature has remained unchanged.
Like attribute data aggregation, reviewed in the last week, spatial data aggregation can be a way of condensing data. Aggregated data show some statistics about a variable (typically average or total) in relation to some kind of grouping variable. This section demonstrates how similar functions work using spatial grouping variables.
Returning to the example of New Zealand, imagine you want to find out the average height of high points in each region. This is a good example of spatial aggregation: it is the geometry of the source (y or nz in this case) that defines how values in the target object (x or nz_height) are grouped. This is illustrated using the base aggregate() function below:
nz_avheight = aggregate(x = nz_height, by = nz, FUN = mean)
nz_avheight
## Simple feature collection with 16 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
## epsg (SRID): 2193
## proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## First 10 features:
## t50_fid elevation geometry
## 1 NA NA MULTIPOLYGON (((1745493 600...
## 2 NA NA MULTIPOLYGON (((1803822 590...
## 3 2408405 2734.333 MULTIPOLYGON (((1860345 585...
## 4 NA NA MULTIPOLYGON (((2049387 583...
## 5 NA NA MULTIPOLYGON (((2024489 567...
## 6 NA NA MULTIPOLYGON (((2024489 567...
## 7 NA NA MULTIPOLYGON (((1740438 571...
## 8 2408394 2777.000 MULTIPOLYGON (((1866732 566...
## 9 NA NA MULTIPOLYGON (((1881590 548...
## 10 2368390 2889.455 MULTIPOLYGON (((1557042 531...
The result of the previous command is an sf object with the same geometry as the (spatial) aggregating object (nz).1 The result of the previous operation is illustrated in Figure @ref(fig:spatial-aggregation). The same result can also be generated using the ‘tidy’ functions group_by() and summarize() (used in combination with st_join()):
Average height of the top 101 high points across the regions of New Zealand.
nz_avheight2 = nz %>%
st_join(nz_height) %>%
group_by(Name) %>%
summarize(elevation = mean(elevation, na.rm = TRUE))
print(nz_avheight2)
## Simple feature collection with 16 features and 2 fields
## geometry type: GEOMETRY
## dimension: XY
## bbox: xmin: 1090144 ymin: 4748537 xmax: 2089533 ymax: 6191874
## epsg (SRID): 2193
## proj4string: +proj=tmerc +lat_0=0 +lon_0=173 +k=0.9996 +x_0=1600000 +y_0=10000000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## # A tibble: 16 x 3
## Name elevation geom
## <chr> <dbl> <GEOMETRY [m]>
## 1 Auckland NaN MULTIPOLYGON (((1815954 6007878, 1818016 599900…
## 2 Bay of Plen… NaN POLYGON ((2049387 5832785, 2051016 5826423, 204…
## 3 Canterbury 2995. POLYGON ((1686902 5353233, 1679996 5344809, 167…
## 4 Gisborne NaN POLYGON ((2024489 5674920, 2019037 5677334, 201…
## 5 Hawke's Bay NaN POLYGON ((2024489 5674920, 2024126 5663676, 203…
## 6 Manawatu-Wa… 2777 POLYGON ((1866732 5664323, 1868949 5654440, 186…
## 7 Marlborough 2720 MULTIPOLYGON (((1686902 5353233, 1679241 535947…
## 8 Nelson NaN POLYGON ((1624866 5417556, 1616643 5424521, 161…
## 9 Northland NaN POLYGON ((1745493 6001802, 1740539 5995066, 173…
## 10 Otago 2825 POLYGON ((1335205 5126878, 1336956 5118634, 132…
## 11 Southland 2723 MULTIPOLYGON (((1207704 4817130, 1216155 481326…
## 12 Taranaki NaN POLYGON ((1740438 5714538, 1743867 5711520, 175…
## 13 Tasman NaN POLYGON ((1616643 5424521, 1624866 5417556, 162…
## 14 Waikato 2734. POLYGON ((1860345 5859665, 1857808 5853929, 185…
## 15 Wellington NaN POLYGON ((1881590 5489434, 1875693 5479987, 187…
## 16 West Coast 2889. POLYGON ((1557042 5319333, 1554239 5309440, 154…
The resulting nz_avheight objects have the same geometry as the aggregating object nz but with a new column representing the mean average height of points within each region of New Zealand (other summary functions such as median() and sd() can be used in place of mean()). Note that regions containing no points have an associated elevation value of NA. For aggregating operations which also create new geometries, see section @ref(geometry-unions).
Spatial congruence is an important concept related to spatial aggregation. An aggregating object (which we will refer to as y) is congruent with the target object (x) if the two objects have shared borders. Often this is the case for administrative boundary data, whereby the larger units (e.g. Middle Layer Super Output Areas in the UK or districts in many other European countries) are composed of many smaller units (Output Areas in the UK, see ons.gov.uk for further details or municipalities in many other European countries).
While topological relations are binary — a feature either intersects with another or does not — distance relations are continuous. The distance between two objects is calculated with the st_distance() function. This is illustrated in the code chunk below, which finds the distance between the highest point in New Zealand and the geographic centroid of the Canterbury region, created in section @ref(spatial-subsetting):
nz_heighest = nz_height %>% top_n(n = 1, wt = elevation)
canterbury_centroid = st_centroid(canterbury)
st_distance(nz_heighest, canterbury_centroid)
## Units: [m]
## [,1]
## [1,] 115540
There are two potentially surprising things about the result:
units, telling us the distance is 115,540 meters, not 100,000 inches, or any other measure of distance.This second feature hints at another useful feature of st_distance(), its ability to return distance matrices between all combinations of features in objects x and y. This is illustrated in the command below, which finds the distances between the first three features in nz_height and the Otago and Canterbury regions of New Zealand represented by the object co.
co = filter(nz, grepl("Canter|Otag", Name))
st_distance(nz_height[1:3, ], co)
## Units: [m]
## [,1] [,2]
## [1,] 123537.16 15497.72
## [2,] 94282.77 0.00
## [3,] 93018.56 0.00
Note that the distance between the second and third feature in nz_height and the second feature in co is zero. This demonstrates the fact that distances between points and polygons refer to the distance to any part of the polygon: The second and third points in nz_height are in Otago, which can be verified by plotting them (result not shown):
plot(st_geometry(co)[2])
plot(st_geometry(nz_height)[2:3], add = TRUE)
This section builds on section @ref(manipulating-raster-objects), which highlights various basic methods for manipulating raster datasets, to demonstrate more advanced and explicitly spatial raster operations, and uses the objects elev and grain manually created in section @ref(manipulating-raster-objects). For the reader’s convenience, these datasets can be also found in the spData package.
Last week, we showed how to retrieve values associated with specific cell IDs or row and column combinations. Raster objects can also be extracted by location (coordinates) and other spatial objects. To use coordinates for subsetting, one can ‘translate’ the coordinates into a cell ID with the raster function cellFromXY(). An alternative is to use raster::extract() (be careful, there is also a function called extract() in the tidyverse) to extract values. Both methods are demonstrated below to find the value of the cell that covers a point located 0.1 units from the origin.
id = cellFromXY(elev, xy = c(0.1, 0.1))
elev[id]
##
## 16
# the same as
raster::extract(elev, data.frame(x = 0.1, y = 0.1))
##
## 16
It is convenient that both functions also accept objects of class Spatial* Objects. Raster objects can also be subset with another raster object, as illustrated in Figure @ref(fig:raster-subset) (left panel) and demonstrated in the code chunk below:
clip = raster(nrows = 3, ncols = 3, res = 0.3, xmn = 0.9, xmx = 1.8,
ymn = -0.45, ymx = 0.45, vals = rep(1, 9))
print(clip)
## class : RasterLayer
## dimensions : 3, 3, 9 (nrow, ncol, ncell)
## resolution : 0.3, 0.3 (x, y)
## extent : 0.9, 1.8, -0.45, 0.45 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer
## values : 1, 1 (min, max)
# we can also use extract
# extract(elev, extent(clip))
par(mfrow = c(1,2))
em = merge(extent(elev),extent(clip))
plot(em, type="n")
plot(elev,add=TRUE, legend=FALSE)
plot(clip, add=TRUE, legend=FALSE)
plot(em, type="n")
plot(elev)
oneclip <- elev[clip, drop = FALSE]
print(oneclip)
## class : RasterLayer
## dimensions : 2, 1, 2 (nrow, ncol, ncell)
## resolution : 0.5, 0.5 (x, y)
## extent : 1, 1.5, -0.5, 0.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer
## values : 18, 24 (min, max)
par(mfrow = c(1,2))
em = merge(extent(elev),extent(clip))
plot(em, type="n")
plot(elev,add=TRUE, legend=FALSE)
plot(clip, add=TRUE, legend=FALSE)
plot(em, type="n")
plot(oneclip, add=TRUE)
Another common use case of spatial subsetting is when a raster with logical (or NA) values is used to mask another raster with the same extent and resolution, as illustrated in Figure @ref(fig:raster-subset), middle and right panel. In this case, the [, mask() and overlay() functions can be used (results not shown):
# create raster mask
rmask = elev
values(rmask) = sample(c(NA, TRUE), 36, replace = TRUE)
par(mfrow = c(1,2))
em = merge(extent(elev),extent(rmask))
plot(em, main="elev",type="n")
plot(elev,add=TRUE, legend=FALSE)
plot(em, main="mask",type="n")
plot(rmask, add=TRUE, legend=FALSE)
# spatial subsetting
nelev <- elev[rmask, drop = FALSE] # with [ operator
par(mfrow = c(1,2))
plot(em, main="elev",type="n")
plot(elev,add=TRUE, legend=FALSE)
plot(em, main="masked elevation",type="n")
plot(nelev, add=TRUE, legend=FALSE)
nelev2 <- mask(elev, rmask) # with mask()
par(mfrow = c(1,2))
plot(em, main="elev",type="n")
plot(elev,add=TRUE, legend=FALSE)
plot(em, main="a new masked elevation",type="n")
plot(nelev2, add=TRUE, legend=FALSE)
nelev3 <- overlay(elev, rmask, fun = "max") # with overlay
par(mfrow = c(1,2))
plot(em, main="elev",type="n")
plot(elev,add=TRUE, legend=FALSE)
plot(em, main="masked elevation",type="n")
plot(nelev3, add=TRUE, legend=FALSE)
In the code chunk above, we have created a mask object called rmask with values randomly assigned to NA and TRUE. Next we only want to keep those values of elev which are TRUE in rmask, or expressed differently, we want to mask elev with rmask. These operations are in fact Boolean local operations since we compare cell-wise two rasters. The next subsection explores these and related operations in more detail.
Map algebra makes raster processing really fast. This is because raster datasets only implicitly store coordinates. To derive the coordinate of a specific cell, we have to calculate it using its matrix position and the raster resolution and origin. For the processing, however, the geographic position of a cell is barely relevant as long as we make sure that the cell position is still the same after the processing (one-to-one locational correspondence). Additionally, if two or more raster datasets share the same extent, projection and resolution, one could treat them as matrices for the processing. This is exactly what map algebra is doing in R. First, the raster package checks the headers of the rasters on which to perform any algebraic operation, and only if they are correspondent to each other, the processing goes on.2 And secondly, map algebra retains the so-called one-to-one locational correspondence. This is where it substantially differs from matrix algebra which changes positions when for example multiplying or dividing matrices.
Map algebra (or cartographic modeling) divides raster operations into four subclasses [@tomlin_geographic_1990], with each of them either working on one or several grids simultaneously:
This classification scheme uses basically the number of cells involved in a processing step as distinguishing feature. For the sake of completeness, we should mention that raster operations can also be classified by discipline such as terrain, hydrological analysis or image classification. The following sections explain how each type of map algebra operations can be used, with reference to worked examples (also see vignette("Raster") for a technical description of map algebra).
Local operations comprise all cell-by-cell operations in one or several layers. A good example is the classification of intervals of numeric values into groups such as grouping a digital elevation model into low (class 1), middle (class 2) and high elevations (class 3). Using the reclassify() command, we need first to construct a reclassification matrix, where the first column corresponds to the lower and the second column to the upper end of the class. The third column represents the new value for the specified ranges in column one and two. Here, we assign the raster values in the ranges 0–12, 12–24 and 24–36 are reclassified to take values 1, 2 and 3, respectively.
rcl = matrix(c(0, 12, 1, 12, 24, 2, 24, 36, 3), ncol = 3, byrow = TRUE)
print(rcl)
## [,1] [,2] [,3]
## [1,] 0 12 1
## [2,] 12 24 2
## [3,] 24 36 3
recl = reclassify(elev, rcl = rcl)
print(recl)
## class : RasterLayer
## dimensions : 6, 6, 36 (nrow, ncol, ncell)
## resolution : 0.5, 0.5 (x, y)
## extent : -1.5, 1.5, -1.5, 1.5 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer
## values : 1, 3 (min, max)
par(mfrow = c(1,2))
plot(elev, main="original raster", legend=FALSE)
text(elev)
plot(recl, main="reclassified raster", legend=FALSE)
text(recl)
Raster algebra is another classical use case of local operations. This includes adding, subtracting and squaring two rasters. Raster algebra also allows logical operations such as finding all raster cells that are greater than a specific value (5 in our example below). The raster package supports all these operations and more, as described in vignette("Raster") and demonstrated below (results not show):
elev + elev
elev^2
log(elev)
elev > 5
Instead of arithmetic operators, one can also use the calc() and overlay() functions. These functions are more efficient, hence, they are preferable in the presence of large raster datasets. Additionally, they allow you to directly store an output file.
The calculation of the normalized difference vegetation index (NDVI) is one of the most famous local, i.e. pixel-by-pixel, raster operations. It ranges between - 1 and 1 with positive values indicating the presence of living plants (mostly > 0.2). To calculate the NDVI, one uses the red and near-infrared bands of remotely sensed imagery (e.g. Landsat or Sentinel imagery) exploiting the fact that vegetation absorbs light heavily in the visible light spectrum, and especially in the red channel, while reflecting it in the near-infrared spectrum.
\[ \begin{split} NDVI&= \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}\\ \end{split} \] where NIR refers to the near infrared channel and Red to the red channel.
Predictive mapping is another interesting application of local raster operations. The response variable corresponds to measured or observed points in space, for example, species richness, the presence of landslides, tree disease or crop yield. Consequently, we can easily retrieve space- or airborne predictor variables from various rasters (elevation, pH, precipitation, temperature, landcover, soil class, etc.). Subsequently, we model our response as a function of our predictors using lm, glm, gam or a machine-learning technique. ### Focal operations
While local functions operate on one cell, though possibly from multiple layers, focal operations take into account a central cell and its neighbors. The neighborhood (also named kernel, filter or moving window) under consideration is typically of size 3-by-3 cells (that is the central cell and its eight surrounding neighbors) but can take on any other (not necessarily rectangular) shape as defined by the user. A focal operation applies an aggregation function to all cells within the specified neighborhood, uses the corresponding output as the new value for the the central cell, and moves on to the next central cell (Figure @ref(fig:focal-example)). Other names for this operation are spatial filtering and convolution [@burrough_principles_2015].
In R, we can use the focal() function to perform spatial filtering. We define the shape of the moving window with a matrix whose values correspond to weights (see w parameter in the code chunk below). Secondly, the fun parameter lets us specify the function we wish to apply to this neighborhood. Here, we choose the minimum, but any other summary function, including sum(), mean(), or var() can be used.
r_focal = focal(elev, w = matrix(1, nrow = 3, ncol = 3), fun = min)
par(mfrow = c(1,2))
plot(elev, main="original raster", legend=FALSE)
text(elev)
plot(r_focal, main="filtered raster", legend=FALSE)
text(r_focal)
We can quickly check if the output meets our expectations.
In our example, the minimum value has to be always the upper left corner of the moving window (remember we have created the input raster by rowwise incrementing the cell values by one starting at the upper left corner). In this example, the weighting matrix consists only of 1s, meaning each cell has the same weight on the output, but this can be changed.
Focal functions or filters play a dominant role in image processing. Low-pass or smoothing filters use the mean function to remove extremes. In the case of categorical data, we can replace the mean with the mode, which is the most common value. By contrast, high-pass filters accentuate features. The line detection Laplace and Sobel filters might serve as an example here. Check the focal() help page for how to use them in R (this will also be used in the excercises at the end of this chapter).
Also, terrain processing uses heavily focal functions. Think, for instance, of the calculation of the slope, aspect and flow directions. The terrain() function lets you compute a few of these terrain characteristics but has not implemented all popular methods. For example, the Zevenbergen and Thorne method to compute the slope is missing.
Zonal operations are similar to focal operations. The difference is that zonal filters can take on any shape instead of just a predefined window. Our grain size raster is a good example (Figure @ref(fig:cont-raster)) because the different grain sizes are spread in an irregular fashion throughout the raster.
To find the mean elevation for each grain size class, we can use the zonal() command. This kind of operation is also known as zonal statistics in the GIS world.
z = zonal(elev, grain, fun = "mean") %>%
as.data.frame()
z
## zone mean
## 1 1 17.75
## 2 2 18.50
## 3 3 19.25
This returns the statistics for each category, here the mean altitude for each grain size class, and can be added to the attribute table of the ratified raster (see previous chapter).
Global operations are a special case of zonal operations with the entire raster dataset representing a single zone. The most common global operations are descriptive statistics for the entire raster dataset such as the minimum or maximum. Aside from that, global operations are also useful for the computation of distance and weight rasters. In the first case, one can calculate the distance from each cell to a specific target cell. For example, one might want to compute the distance to the nearest coast (see also raster::distance()). We might also want to consider topography, that means, we are not only interested in the pure distance but would like also to avoid the crossing of mountain ranges when going to the coast. To do so, we can weight the distance with elevation so that each additional altitudinal meter ‘prolongs’ the euclidean distance. Visibility and viewshed computations also belong to the family of global operations (in the exercises of Chapter @ref(gis) you will compute a viewshed raster).
Many map algebra operations have a counterpart in vector processing [@liu_essential_2009]. Computing a distance raster (zonal operation) while only considering a maximum distance (logical focal operation) is the equivalent to a vector buffer operation (section @ref(clipping)). Reclassifying raster data (either local or zonal function depending on the input) is equivalent to dissolving vector data. Overlaying two rasters (local operation), where one contains NULL or NA values representing a mask, is similar to vector clipping. Quite similar to spatial clipping is intersecting two layers. The difference is that these two layers (vector or raster) simply share an overlapping area (see Figure @ref(fig:venn-clip) for an example). However, be careful with the wording. Sometimes the same words have slightly different meanings for raster and vector data models. Aggregating in the case of vector data refers to dissolving polygons while it means increasing the resolution in the case of raster data. In fact, one could see dissolving or aggregating polygons as decreasing the resolution. However, zonal operations might be the better raster equivalent compared to changing the cell resolution. Zonal operations can dissolve the cells of one raster in accordance with the zones (categories) of another raster using an aggregation function (see above).
Suppose we would like to compute the NDVI, and additionally want to compute terrain attributes from elevation data for observations within a study area. Before such computations we would have to acquire airborne or remotely sensed information. The corresponding imagery is often divided into scenes covering a specific spatial extent. Frequently, a study area covers more than one scene. In these cases we would like to merge the scenes covered by our study area. In the easiest case, we can just merge these scenes, that is put them side to side. This is possible with digital elevation data (SRTM, ASTER). In the following code chunk we first download the SRTM elevation data for Austria and Switzerland (for the country codes have a look at ccodes()). In a second step, we merge the two rasters into one.
aut = getData("alt", country = "AUT", mask = TRUE)
ch = getData("alt", country = "CHE", mask = TRUE)
aut_ch = merge(aut, ch)
print(aut_ch)
## class : RasterLayer
## dimensions : 408, 1368, 558144 (nrow, ncol, ncell)
## resolution : 0.008333333, 0.008333333 (x, y)
## extent : 5.9, 17.3, 45.7, 49.1 (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
## data source : in memory
## names : layer
## values : 108, 4393 (min, max)
Raster’s merge() command combines two raster layers, and in case they overlap, it uses the value of the first raster. You can do exactly the same with gdalUtils::mosaic_rasters() which is faster, and therefore recommended if you have to merge a multitude of large rasters stored on disk.
The merging approach is of little use when the overlapping values do not correspond to each other. This is frequently the case when you want to combine spectral imagery from scenes that were taken on different dates. The merge() command will still work but you will see a clear border in the resulting image. The mosaic() command lets you define a function for the overlapping area. For instance, we could compute the mean value. This might smooth the clear border in the merged result but it will most likely not make it disappear. To do so, we need a more advanced approach. Remote sensing scientists frequently apply histogram matching or use regression techniques to align the values of the first image with those of the second image. The packages landsat (histmatch(), relnorm(), PIF()), satellite (calcHistMatch()) and RStoolbox (histMatch(), pifMatch()) provide the corresponding functions. For a more detailed introduction on how to use R for remote sensing, we refer the reader to @wegmann_remote_2016.
This can be verified with identical(st_geometry(nz), st_geometry(nz_avheight)).↩
Map algebra operations are also possible with headerless rasters but in this case the user has to make sure that in fact there exists a one-to-one locational correspondence. An example showing how to import a headerless raster into R is provided in a post at https://stat.ethz.ch/pipermail/r-sig-geo/2013-May/018278.html.↩